10 research outputs found


    Co-scheduling algorithms for high-throughput workload execution

    This paper investigates co-scheduling algorithms for processing a set of parallel applications. Instead of executing each application one by one, using a maximum degree of parallelism for each of them, we aim at scheduling several applications concurrently. We partition the original application set into a series of packs, which are executed one by one. A pack comprises several applications, each of them with an assigned number of processors, with the constraint that the total number of processors assigned within a pack does not exceed the maximum number of available processors. The objective is to determine a partition into packs, and an assignment of processors to applications, that minimize the sum of the execution times of the packs. We thoroughly study the complexity of this optimization problem, and propose several heuristics that exhibit very good performance on a variety of workloads, whose application execution times model profiles of parallel scientific codes. We show that co-scheduling leads to faster workload completion time (40% improvement on average over traditional scheduling) and to faster response times (50% improvement). Hence co-scheduling increases system throughput and saves energy, leading to significant benefits from both the user and system perspectives.
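The trade-off behind packing can be sketched numerically. The toy example below is an illustration, not one of the paper's heuristics: it assumes an Amdahl-law speedup model with a made-up serial fraction `alpha` and made-up workloads, and compares running two applications one by one on all processors against co-scheduling them in a single pack with the best integer processor split.

```python
# Toy co-scheduling sketch; the speedup model and all numbers are assumptions.
# Amdahl's law: on p processors, an application with sequential work w and
# serial fraction alpha runs in w * (alpha + (1 - alpha) / p).

def exec_time(work, procs, alpha=0.1):
    return work * (alpha + (1.0 - alpha) / procs)

def one_by_one(works, P):
    """Traditional schedule: each application runs alone on all P processors."""
    return sum(exec_time(w, P) for w in works)

def best_pack(works, P):
    """One pack holding both applications: try every integer processor split
    and keep the split minimizing the pack time (the slower of the two)."""
    w1, w2 = works
    return min(max(exec_time(w1, p), exec_time(w2, P - p))
               for p in range(1, P))

works, P = [8.0, 4.0], 8          # made-up sequential workloads
print(one_by_one(works, P))       # running the applications one by one
print(best_pack(works, P))        # co-scheduling them in a single pack
```

With these numbers the pack finishes sooner than the one-by-one schedule, because the serial fraction leaves most processors idle whenever one application monopolizes the machine.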

    Co-scheduling algorithms for cache-partitioned systems

    Cache-partitioned architectures allow subsections of the shared last-level cache (LLC) to be exclusively reserved for some applications. This technique dramatically limits interactions between applications that are concurrently executing on a multi-core machine. Consider n applications that execute concurrently, with the objective to minimize the makespan, defined as the maximum completion time of the n applications. Key scheduling questions are: (i) which proportion of cache and (ii) how many processors should be given to each application? Here, we assign rational numbers of processors to each application, since they can be shared across applications through multi-threading. In this paper, we provide answers to (i) and (ii) for perfectly parallel applications. Even though the problem is shown to be NP-complete, we give key elements to determine the subset of applications that should share the LLC (while remaining ones only use their smaller private cache). Building upon these results, we design efficient heuristics for general applications. Extensive simulations demonstrate the usefulness of co-scheduling when our efficient cache partitioning strategies are deployed.
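The interplay between cache shares and the makespan can be made concrete with a small model. The sketch below uses assumed formulas and numbers, not the paper's: each perfectly parallel application's sequential work is inflated when its LLC share falls below its cache footprint, and since processors are rational and the applications perfectly parallel, giving each application processors proportional to its inflated work makes all of them finish together, so the makespan is the total inflated work divided by P.

```python
# Toy cache-partitioning model; the inflation formula and numbers are assumptions.

def inflated_work(work, footprint, share, penalty=4.0):
    """Sequential work grows when the LLC share falls below the footprint."""
    return work * (1.0 + penalty * max(0.0, footprint - share))

def makespan(apps, shares, P):
    """apps: (work, footprint) pairs; shares: LLC fraction per application.
    Perfectly parallel jobs with rational processor shares proportional to
    their inflated work all finish together, at total inflated work / P."""
    total = sum(inflated_work(w, f, x) for (w, f), x in zip(apps, shares))
    return total / P

apps = [(10.0, 0.6), (10.0, 0.2)]     # made-up (work, LLC footprint) pairs
P = 8
print(makespan(apps, [0.5, 0.5], P))  # naive equal split of the LLC
print(makespan(apps, [0.8, 0.2], P))  # cache shares follow the footprints
```

Matching the cache shares to the footprints removes all inflation here, which is the kind of gain the partitioning strategies above aim for.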

    Co-scheduling Amdahl applications on cache-partitioned systems

    Cache-partitioned architectures allow subsections of the shared last-level cache (LLC) to be exclusively reserved for some applications. This technique dramatically limits interactions between applications that are concurrently executing on a multi-core machine. Consider n applications that execute concurrently, with the objective to minimize the makespan, defined as the maximum completion time of the n applications. Key scheduling questions are: (i) which proportion of cache and (ii) how many processors should be given to each application? In this paper, we provide answers to (i) and (ii) for Amdahl applications. Even though the problem is shown to be NP-complete, we give key elements to determine the subset of applications that should share the LLC (while remaining ones only use their smaller private cache). Building upon these results, we design efficient heuristics for Amdahl applications. Extensive simulations demonstrate the usefulness of co-scheduling when our efficient cache partitioning strategies are deployed.
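Questions (i) and (ii) can be explored jointly by brute force on a toy model. The grid search below is an illustration under assumed formulas (cache misses inflate the work, then Amdahl's law caps the speedup), not one of the paper's heuristics, and the (work, alpha, footprint) triples are made up.

```python
# Toy joint search over (i) the cache proportion and (ii) the processor
# count per application. All model choices and numbers are assumptions.

def amdahl_time(work, alpha, footprint, procs, share, penalty=4.0):
    """Cache misses inflate the work, then Amdahl's law caps the speedup."""
    inflated = work * (1.0 + penalty * max(0.0, footprint - share))
    return inflated * (alpha + (1.0 - alpha) / procs)

def best_split(app1, app2, P, steps=100):
    """Grid-search the LLC share x and processor count p given to app1
    (app2 gets the rest); return (makespan, x, p) with the smallest
    makespan, i.e. the slower of the two completion times."""
    best = (float("inf"), 0.0, 0)
    for p in range(1, P):
        for i in range(steps + 1):
            x = i / steps
            span = max(amdahl_time(*app1, procs=p, share=x),
                       amdahl_time(*app2, procs=P - p, share=1.0 - x))
            best = min(best, (span, x, p))
    return best

# made-up (work, alpha, footprint) triples
app1, app2 = (10.0, 0.05, 0.6), (10.0, 0.05, 0.2)
span, x, p = best_split(app1, app2, P=8)
print(span, x, p)
```

Here the footprints fit together (0.6 + 0.2 <= 1), so the search settles on a cache split that avoids all inflation plus an even processor split; with larger footprints it starts trading cache between the two applications instead.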